12 research outputs found
Real-time performance diagnosis and evaluation of big data systems in cloud datacenters
PhD Thesis
Modern big data processing systems are becoming very complex in terms of large scale, high concurrency, and multi-tenancy. Thus, many failures and performance
reductions only happen at run-time and are very difficult to capture. Moreover, some
issues may only be triggered when certain components are executed. To analyze the root
cause of these types of issues, we have to capture the dependencies of each component
in real-time.
Big data processing systems, such as Hadoop and Spark, usually work in large-scale,
highly-concurrent, and multi-tenant environments that can easily cause hardware and
software malfunctions or failures, thereby leading to performance degradation. Several systems and methods exist to detect big data processing systems’ performance
degradation, perform root-cause analysis, and even overcome the issues causing such
degradation. However, these solutions focus on specific problems such as stragglers and
inefficient resource utilization. There is a lack of a generic and extensible framework
to support the real-time diagnosis of big data systems.
Performance diagnosis and prediction of big data systems are highly complex as these
frameworks are typically deployed in cloud data centers that are large-scale, highly
concurrent, and follow a multi-tenant model. Several factors, including hardware
heterogeneity, stochastic networks, and application workloads, may impact the performance of big data systems. The current state of the art does not sufficiently address
the challenge of determining the complex, usually stochastic and hidden, relationships between these factors.
To handle performance diagnosis and evaluation of big data systems in cloud environments, this thesis proposes multilateral research on monitoring, performance
diagnosis, and performance prediction in cloud-based large-scale distributed systems, combined into
an effective and efficient deployment pipeline. The key contributions of this dissertation are listed below:
• Designing a real-time big data monitoring system called SmartMonit that efficiently collects runtime system information, including computing resource
utilization and job execution information, and then relates the collected information to the execution graph, modeled as a directed acyclic graph (DAG).
• Developing AutoDiagn, an automated real-time diagnosis framework for big data
systems that automatically detects performance degradation and inefficient resource utilization problems, while providing online detection and semi-online
root-cause analysis for a big data system.
• Designing a novel root-cause analysis technique/system called BigPerf for big
data systems that analyzes and characterizes the performance of big data applications by incorporating Bayesian networks to determine uncertain and complex
relationships between performance-related factors.
State of the Republic of Turkey and the Turkish Ministry of National Educatio
RootPath: Root Cause and Critical Path Analysis to Ensure Sustainable and Resilient Consumer-Centric Big Data Processing under Fault Scenarios
The exponential growth of consumer-centric big data has led to increased concerns regarding the sustainability and resilience of data processing systems, particularly in the face of fault scenarios. This paper presents an approach that integrates Root Cause Analysis (RCA) and Critical Path Analysis (CPA) to address these challenges and ensure sustainable, resilient consumer-centric big data processing. The proposed methodology enables probabilistic identification of the root causes behind system faults using Bayesian networks. Furthermore, an Artificial Neural Network (ANN)-based critical path method is employed to identify the critical path that causes high makespan in MapReduce workflows, in order to enhance fault tolerance and optimize resource allocation. To evaluate the effectiveness of the proposed methodology, we conduct a series of fault injection experiments, simulating various real-world fault scenarios commonly encountered in operational environments. The experimental results show that both models achieve high accuracy, 95% and 98%, respectively, enabling the development of more robust and reliable consumer-centric systems.
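To illustrate what probabilistic root-cause identification with a Bayesian network can look like, here is a minimal Python sketch. It is not the RootPath model: the two candidate causes, their priors, and the conditional probability table are hypothetical values chosen only for illustration, and inference is done by brute-force enumeration rather than with any particular library.

```python
from itertools import product

# Hypothetical priors for two independent candidate root causes.
PRIOR = {"cpu_contention": 0.2, "network_delay": 0.1}

# Hypothetical table: P(job is slow | cpu_contention, network_delay).
P_SLOW = {
    (False, False): 0.05,
    (True, False): 0.70,
    (False, True): 0.60,
    (True, True): 0.95,
}


def posterior_given_slow():
    """Return P(cause present | job is slow) for each cause, by enumeration."""
    causes = list(PRIOR)
    joint = {}
    # Enumerate every combination of cause states and weight it by
    # its prior probability times the likelihood of a slow job.
    for states in product([False, True], repeat=len(causes)):
        p = 1.0
        for cause, present in zip(causes, states):
            p *= PRIOR[cause] if present else 1.0 - PRIOR[cause]
        joint[states] = p * P_SLOW[states]
    evidence = sum(joint.values())  # P(job is slow)
    return {
        cause: sum(p for states, p in joint.items() if states[i]) / evidence
        for i, cause in enumerate(causes)
    }


if __name__ == "__main__":
    for cause, prob in posterior_given_slow().items():
        print(f"{cause}: {prob:.3f}")
```

With these made-up numbers, observing a slow job raises the probability of CPU contention well above its prior, which is the kind of ranking a root-cause analysis would report.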
Federated-ANN based Critical Path Analysis and Health Recommendations for MapReduce Workflows in Consumer Electronics Applications
Although much research has been done to improve the performance of big data systems, predicting the performance degradation of these systems quickly and efficiently remains a significant challenge, because the complexity of these systems makes degradation hard to predict ahead of time. Long execution time is often discussed in the context of performance degradation of big data systems. This paper proposes MrPath, a federated AI-based critical path analysis approach for holistic performance prediction of MapReduce workflows for consumer electronics applications, while enabling root-cause analysis of various types of faults. We have implemented a federated artificial neural network (FANN) to predict the critical path in a MapReduce workflow. After the critical path components (e.g., mapper1, reducer2) are predicted/detected, root-cause analysis uses user-defined functions (UDFs) to pinpoint the most likely reasons for the observed performance problems. Finally, health node classification is performed using an ANN-based Self-Organising Map (SOM). The results show that the AI-based critical path analysis method can significantly illuminate the reasons behind long execution times in big data systems.
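For readers unfamiliar with the term, the critical path of a workflow is the chain of dependent tasks that determines its makespan. The sketch below computes it with the classic longest-path method over a DAG when task durations are already known. This is not the federated ANN that MrPath uses to predict the path; the task names, durations, and dependencies are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, deps):
    """Return (path, makespan) for a task DAG.

    durations: task -> execution time in seconds
    deps: task -> set of tasks that must finish first
    """
    finish = {}     # earliest finish time of each task
    best_pred = {}  # predecessor that finishes last, per task
    graph = {task: deps.get(task, set()) for task in durations}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(graph).static_order():
        preds = deps.get(task, ())
        pred = max(preds, key=lambda t: finish[t], default=None)
        best_pred[task] = pred
        start = finish[pred] if pred is not None else 0.0
        finish[task] = start + durations[task]
    # Walk back from the task that finishes last.
    end = max(finish, key=finish.get)
    makespan = finish[end]
    path = []
    while end is not None:
        path.append(end)
        end = best_pred[end]
    return path[::-1], makespan


if __name__ == "__main__":
    # Hypothetical two-mapper, two-reducer workflow.
    durations = {"mapper1": 30, "mapper2": 50, "reducer1": 20, "reducer2": 40}
    deps = {
        "reducer1": {"mapper1", "mapper2"},
        "reducer2": {"mapper1", "mapper2"},
    }
    print(critical_path(durations, deps))
```

In this toy workflow the slowest mapper and the slowest reducer form the critical path, so speeding up any other task would not shorten the job.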
MapChain: A Blockchain-based Verifiable Healthcare Service Management in IoT-based Big Data Ecosystem
Internet of Things (IoT)-based healthcare services, which are becoming more widespread today, continuously generate huge amounts of data, often called big data. Due to the magnitude and intricacy of the data, it is difficult to find valuable information that can be used for decision-making and prediction. Big data systems provide a significant infrastructure service to better serve the purpose of IoT systems and support critical decision-making. On the other hand, privacy preservation, data integrity, and identity verification are essential requirements in healthcare big data service management. To overcome these problems, this article offers a scalable computing system that provides a verifiable data access mechanism for IoT-enabled health data analytics in the big data ecosystem. The proposed architecture has two primary sub-architectures, namely a big data analytics tracking system and a derived blockchain-based data storage/access system. This approach leverages big data systems and blockchain architecture to analyze and securely store data from IoT-enabled devices and to allow verified access to the stored data. The zero-knowledge protocol is used to ensure that no information is accessible to unauthenticated users, while also avoiding data linkability. The results demonstrate the effectiveness of our method in solving the problems of big data analytics and privacy in healthcare.